ABSTRACT



A great deal of research has been conducted in the past
20 years concerning the use of voice recognition equipment
with computers. The goal of this research has been to
improve the man-machine interface. With the breakthrough
from discrete to continuous voice recognition technology in
I. INTRODUCTION

As computer science has advanced into the era of human
interactive design, one technology which has received
increasing attention and has already demonstrated many
practical results is that of speech recognition. It has the
potential to vastly change the state of man-machine
interaction by allowing humans to use their most natural
communications output mode, speech, and thus freeing them
from the constraints of the keyboard. This thesis is
concerned with the application of speech recognition
technology to military wargaming, specifically a computer-aided
simulation of the naval warfare environment known as the
Naval Warfare Interactive Simulation System (NWISS).

Because computer-aided wargames entail a very high degree of
human interaction, they are excellent candidates for
application of voice recognition technology. The remainder of
this chapter will cover what speech recognition technology
is and can do, describe a specific implementation of this
technology in a product named the Verbex 3000, introduce
NWISS in more detail and close with a summary of the thesis
objectives.

A. REVIEW OF VOICE RECOGNITION TECHNOLOGY 

1. General

In the discussion of automatic speech recognition,
the distinction between recognition and understanding is
sometimes unclear. W.A. Lea [Ref. 1: p. 40] has defined
voice recognition by machine "generally as the process of
transforming the continuous human acoustic signal into
discrete representations which may be assigned proper
meanings and which may be comprehended to affect responsive
behavior." For the purposes of this thesis, the process
will be understood as the conversion of human speech into
recognizable text, i.e. words and symbols. Due to
idiosyncrasies of the human voice, as exhibited in such individual
variations as sex, race, geographic origin, age, emotion and
numerous other factors which impact the human acoustic
signal, this is by no means a simple task. Machine
understanding of speech, on the other hand, is a closely related
activity which follows recognition and applies artificial
intelligence to invoke parsing rules and to make logical
inferences from the semantic content of the spoken
(recognized) message. This task is also very difficult, as humans
know from daily experience (not so with recognition, which is
generally taken for granted). While speech recognition and
understanding are not separate functions for the human (we
often use semantics to reconstruct/complete a sentence we
did not fully hear or listen to), they have tended as
computer technologies to develop in a parallel but partly
separated fashion. [Ref. 2]

It has long been recognized that speech is the
human's highest capacity, most natural output communications
channel. However, it has only been during the past thirty
years that it has been possible to create machines which
begin to take advantage of this fact. In terms of human
input to computers, the keyboard has had superior speed,
error correction capability and overall versatility. How
long the keyboard will retain this superiority is open to
debate. Already commercial word recognizers have been
effectively used for such jobs as package-sorting systems
and inspection and quality control, where keyboards or
numeric keypads served before. Military uses include
cartography, computer-assisted training of air traffic
controllers and aircraft cockpit communications.

[Ref. 3: p. 28]
These uses of speech recognition technology have
benefited from several advantageous properties inherent in
speech input to machines in addition to high speed and
naturalness. Automatic speech recognition is unique in its
ability to free the user's mind and eyes for such tasks as
viewing graphics screens or other decision aids in a command
post, overseeing an operations center, or just reading from
a data source without having to look away to find and
verify the correct key being struck. While skilled
typists can read from a data source and input data at a
rapid rate, such proficiency is not achieved without a good
deal of training. By comparison, training in the use of
voice recognition equipment can be minimal. [Ref. 4]

A further advantage of automatic speech recognition
is mobility. With a lightweight, wireless microphone
headset, a person is free to roam about and attend to other
duties, such as an air traffic controller monitoring radar
screens and speaking simultaneously to pilots and machine,
either for transcript purposes or to control navigation and
landing aids. Finally, machine access can be controlled
through spoken codeword authentication using voice
recognition in combination with a speaker verification system.

[Ref. 4] 



2. Continuous Voice Recognition

Historically, almost all commercial applications of
voice recognition technology have fallen in the category
known as isolated (discrete) word recognizers. Typically
this class of speech recognizers has demonstrated the
ability to recognize limited vocabularies (up to 300 words)
where the speaker is required to pause perceptibly between
each word or utterance (string of words constrained to a
specific timeframe such as 1.5 or 2.0 seconds). The pauses
provide boundaries for the machine processing of the voice
message and allow the machine to "catch up" to the speaker.
Discrete speech recognition detracts from the naturalness of
human speech and imposes an artificial constraint on the
speaker, who requires training to adapt to such a speaking
mode.

Sparked by the five-year, $15 million research
effort of the Department of Defense's Advanced Research
Projects Agency in the mid-1970s known as Speech
Understanding Research (ARPA SUR), the recognition of continuous
speech was proven possible in the laboratory. With further
advances in microcomputer and memory technologies, the
first commercial continuous voice recognition products have
come on the market in the past three years. While speaker
dependence and limited vocabulary (300 or so words) are
still the rule, both naturalness and input rate are clearly
enhanced when continuous speech is used. "One
can continuously speak at a rate of about 150 to 300 words
or more per minute, but when words must be individually
separated by pauses, the rate drops to less than 125
(usually to around 50 to 30) words per minute."

[Ref. 1: p. 66] 

In general, continuous voice recognizers rely on the
definition of grammars to limit the number of words from
which the machine must choose at any instant based on
previously recognized words. A grammar is a representation of
the allowable word sequences in a state diagram composed of
nodes representing a word or group of words and the possible
transitions between the nodes. While it has been shown that
finite state grammars cannot properly characterize major
subsets of English sentences unless sentence complexity is
severely limited, they are quite appropriate for
applications involving strictly-formatted sequences of words.

[Ref. 1: p. 52]
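As an illustration only, a finite-state grammar of this kind can be sketched as a transition table in which each node lists the words permitted next. The Python sketch below is hypothetical: its node names and tiny vocabulary are not taken from the Verbex 3000 or from NWISS. It shows how the set of candidate words at any instant depends on the previously recognized word, which is what keeps the recognizer's choice small.

```python
# Hypothetical finite-state grammar: each node maps a permitted word to
# the node reached after that word is recognized.
TRANSITIONS = {
    "start": {"COURSE": "first_digit", "SPEED": "first_digit"},
    "first_digit": {str(d): "more_digits" for d in range(10)},
    "more_digits": {str(d): "more_digits" for d in range(10)},
}
FINAL_NODES = {"more_digits"}


def active_vocabulary(node):
    """Words the recognizer must choose among while at this node."""
    return sorted(TRANSITIONS.get(node, {}))


def accepts(words):
    """True if the word sequence follows the transitions to a final node."""
    node = "start"
    for word in words:
        next_nodes = TRANSITIONS.get(node, {})
        if word not in next_nodes:
            return False
        node = next_nodes[word]
    return node in FINAL_NODES


print(active_vocabulary("start"))          # ['COURSE', 'SPEED']
print(accepts(["COURSE", "2", "7", "0"]))  # True
print(accepts(["COURSE", "SPEED"]))        # False
```

In the start node only two words are candidates; after COURSE or SPEED only the ten digits are.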
B. VERBEX 3000 SPEECH APPLICATION DEVELOPMENT SYSTEM
(SPADS) 

One product in particular which resulted from continuous
voice work of the 1970s and which is the basis for this
thesis is the Verbex 3000 voice terminal, marketed by a
subsidiary of Exxon Corporation. The Verbex 3000 [Ref. 5:
p. 8] "is a continuous speech, voice data entry terminal.
It can operate as an input/output peripheral that adds voice
entry capability to other computer systems or, in some
applications, as a stand-alone data handling system. It is
designed for use in industrial and commercial environments
with either high or low noise levels, and allows operators
to input data and commands in a naturally spoken stream of
numbers, words, or phrases, without pausing." With its
maximum number of four speech processing boards, the Verbex
3000 can recognize up to 360 different words spread over as
many as 20 grammars.¹ A finite limit to grammar size, based
on total number of words and complexity of the node
transition network, is necessary to allow the device to remain
"real time" in terms of computation speed and memory (stored
voice patterns) requirements. Thus the total application
may involve up to 360 words, but at any instant the
recognizer is dealing with only a subset (grammar).

The Speech Application Development System (SPADS) is a
hardware/software adjunct to the Verbex 3000 voice terminal
which allows the user to program the voice terminal to run a
particular application. In other words, the design and
definition of grammars, the control of transitions between
grammars, the processing of text output, and the design of
the terminal's visual and aural interface (feedback) with
the operator are accomplished through SPADS. Verbex has
designed SPADS so that Verbex 3000 users can write their own
applications and update them as required vice purchasing
customer engineering services from Verbex. Currently in
beta test status (as delivered to NPS), SPADS is
intended to be user friendly, such that the building of
applications and grammars is done through menu-driven
editors. A procedure must be written in the Pascal
programming language to control the application. Verbex supplies
about 20 predefined functions to ease this process.

¹This information was obtained at a SPADS training
course attended by the author and conducted by Mr. Thomas
DiGennaro, Verbex Senior Software Engineer, 7-9 November
1983. Future references to this source will simply be
denoted "SPADS Training Course".

C. NAVAL WARFARE INTERACTIVE SIMULATION SYSTEM (NWISS)

As noted earlier, the candidate system for application
of continuous voice recognition technology was the NWISS
developed at the Battle Group Training Computer Support
Facility within the Naval Ocean Systems Center at San Diego.
Per the NWISS user's manual, NWISS "is a real-time, man-
interactive, discrete event, time step computer-aided
simulation of the naval warfare environment. . . . for the
purpose of supporting the training of senior naval officers
in force-level tactical decision making and management in
command and control."

In addition to NOSC, the NWISS is resident at the NPS
Wargaming Analysis and Research Laboratory, where it is used
primarily to introduce wargaming to students and to expose
them to tactical, force-level decision making problems in
command and control. The NWISS supports a two-sided (Blue
vs. Orange) scenario in which opposing sides can define,
structure, and dynamically control forces with the support
of an umpire-like function called Control. Normally, at
NPS, the force building and structuring phases are
accomplished by the instructor and the students begin the wargame
phase with a predefined scenario and force structure. (The
set of scenarios most often used at NPS is based on
operations in the Sea of Japan. Since the author's NWISS
experience was as a Blue player, this thesis uses the Blue force
levels and names from the Sea of Japan scenario set.)

During the wargame phase the two sides must position and
equip their forces and sensors to be ready for whatever the
scenario entails, staying within the rules of engagement,
and to engage the opposing forces in combat when
appropriate. The NWISS can support various "views" of the action,
representing the current tactical situation known to a user
through the various sensors organic to his controlled force
elements. Thus the Blue side may, for example, consist of
several views representing different warfare commanders.

Typically a player has at his disposal an alphanumeric 
display capable of showing various information status 
boards, a color graphics display showing force positions 
(with Naval Tactical Data System symbology) and sensor 
information superimposed over a map of the area of opera- 
tion, and an alphanumeric terminal for entering the player 
commands. Via keyboard, the player may enter strictly-
formatted commands to change the graphics display
characteristics, to equip and move forces, to control
sensors, to engage enemy forces and, in general, to command
and control the battle from his view. The alphanumeric
terminal can also be used to send and receive messages 
between players (views) of the same side and/or Control. 

The purpose of this brief description of NWISS is to
provide a sense of the system's overall capability and, more
importantly, to emphasize that, as currently configured at
NPS (but not NOSC), all input to the NWISS from its players
is via keyboard. Although a fair degree of user friendliness
has been incorporated (prompts and help for entering the
next field of a command are readily provided, only enough
characters need be typed to guarantee uniqueness, and errors
are easily corrected), command entry can still be a
laborious, error-prone, time-consuming task for many users.
Precise entry of force names, weapon identifiers,
latitude-longitude, bearings, ranges, altitudes, et cetera is
required, and errors are not always easily forgiven
by NWISS.



D. PURPOSE

1. A Past Study of Voice Recognition Applied to Wargaming

The predecessor to NWISS at NPS was the Warfare
Environmental Simulator (WES), also developed at NOSC, with
many of the same capabilities as NWISS but to a lesser
degree (particularly in system response time). In 1981, M.
J. McSorley published his master's thesis at NPS on the
subject of using voice recognition technology to run WES.
Using a discrete voice recognizer (Threshold 600), he
compiled a set of typical WES commands and conducted an
experiment with 12 subjects of varied typing abilities and
voice (microphone) experience to determine which input
medium was superior. Based on lengthy statistical analysis
of speed and error results, McSorley [Ref. 6: p. 65]
concluded that "the subjects were able to input WES commands
faster and with fewer total errors using the manual typing
mode than with voice mode." Since McSorley's subjects were
using discrete voice and had an average typing ability
better than 35 words per minute, the results are not too
surprising.

2. Thesis Objectives

One reason for summarizing McSorley's work is to
show that interest in applying voice recognition technology
to computer-aided wargames is not new. Such wargames are
highly interactive and hence invite attempts to improve the
interaction (and the participants are usually willing
subjects in new or experimental undertakings). Given the
inherent advantages of voice input stated earlier, it seems
only natural to want to apply speech recognition technology
to a wargame such as NWISS.

The more cogent reason for reporting McSorley's
results is to establish the grounds for this thesis: to
make use of the progression from discrete to continuous
voice recognition technology and build a voice input medium
to NWISS which has the potential to compete effectively with
the keyboard in both speed and error results. With such an
input medium, NWISS players will be able to spend their time
more profitably in monitoring the graphics display and
status boards and in commanding and controlling their forces
with more natural voice commands, as opposed to being tied to
a keyboard. However, this application of continuous voice
recognition technology is not intended to, nor can it,
completely replace the player's alphanumeric keyboard.
Rather, it is intended to substantially improve the
man-machine interface for NWISS and allow the player to perform
all but a very small part of his input with voice vice the
keyboard.

Given the commercial state of the art (as
represented by the Verbex 3000), the challenge is to thoroughly
scrutinize the subject application (NWISS) and so design
grammars and a grammar transition network in software
that grammar boundaries (which tend to be "discrete") occur
in places where natural pauses occur or where the user can
be easily induced to pause with minimal disruption to speech
patterns.

Thus the purpose of this thesis is to show that
continuous voice recognition technology can be effectively
applied to a computer-aided wargame through demonstration of
the software design process (Chapter 2) and the user's
operating instructions (Chapter 3). To be precise, the NWISS
Blue player commands, force names, weapon identifiers, et
cetera for the Sea of Japan scenario (totalling about 150
words or symbols) have been placed into 10 different (but
overlapping) grammars in an application which uses about 800
lines of Pascal to control the network flow between grammars
and restructure textual output to NWISS requirements. The
application is designed to be user friendly and to require a
minimum of player involvement with the mechanics of the
process.
II. SOFTWARE DESIGN



As noted in Chapter 1, the application of continuous
voice recognition technology requires careful study of the
man-machine interactive process being substituted for or
supported. In the case of NWISS, such questions as the
following must be answered:

• Where will natural input pauses occur?

• What feedback or prompting will the user require?

• What means must be provided for error control?

• How must the data be structured for the host computer
and what special data characters, if any, are needed?

• How will the overall process be controlled,
particularly in the time domain?

The answers to these and other related questions lead to the
initial design of the grammars and the application control
program. This chapter will define the NWISS requirements in
terms of input command syntax and data structures, describe
the grammars which resulted from the design process, and
provide some insights into the structure of the application
program code.

A. NWISS REQUIREMENTS
1. Command Syntax

There are approximately 37 commands which the NWISS
player can issue to forces under his control for maneuvering
and launching platforms, controlling sensors, and engagement
[Ref. 7]. In this paper these are referred to as "FOR"
commands, of which about 23 are used with any frequency at
NPS and have been included in the continuous voice
recognition application (see Table I). There are approximately
another 20 commands an NWISS player can use to control the
graphics display characteristics, obtain bearings and
accomplish other actions [Ref. 7]. Again, these 20 commands are
not all used at NPS, and so the 10 commands used most often
have been included in the voice application and are
addressed later. The primary reasons for excluding unused
commands are grammar and application code efficiency and
reduced user training time; these design factors and others
will be addressed throughout this chapter. What follows now
is a description of the general syntax and some examples of
these commands.



TABLE I

Permitted NWISS "FOR" Commands

ALTITUDE    DECM      MISSION     SPEED
BARRIER     DEPTH     ORDERS      STATION
BINGO       EMCON     PERISCOPE   SURFACE
BLIP        FIRE      PROCEED     TAKE
COURSE      LAUNCH    RBOC        WEAPONS
COVER       LOAD      REFUEL



a. "FOR" Commands

The primary NWISS command syntax is quite simple
and consists of the following standard format:

FOR <addressee> <command> [TIME <start-minute>] <CR>

where the following conventions apply:

1. Capitalized words are command keywords which may be
entered in abbreviated form (enough letters to ensure
uniqueness).

2. Lowercase words inside parentheses are prompts from
NWISS received when the space or escape character is
struck after the preceding field.

3. Lowercase words inside angle brackets are command
arguments which must be specified precisely.

4. Keywords and arguments inside square brackets are optional.

5. "FOR <addressee>" is not required for subsequent
commands directed to the same addressee.

Some specific command formats and examples are shown in
Figure 2.1, where <CR> (carriage return) is assumed and
"TIME" is not used.

FOR <addressee> ALTITUDE <feet>
"FOR VA024 ALTITUDE 4000"

FOR <addressee> COVER <track #>
"FOR MP604 COVER ES002"

FOR <addressee> PROCEED POSITION <latitude> <longitude>
"FOR KNOX PROCEED POSITION 36-30N 134-55E"

FOR <addressee> FIRE <number> <name> TORPEDO (at) <track #>
"FOR OMAHA FIRE 2 MK48 TORPEDO (at) 3S006"

Figure 2.1 Examples of "FOR" Command Syntax.

As convention #4 implies, the keyword "TIME" and its
following argument are optional: most player commands are
entered without specifying a desired game time, so that
execution by NWISS occurs in real time.
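The standard format can be illustrated with a small Python sketch that assembles a command string from its fields. The function `format_for_command` and its parameters are hypothetical and are not part of NWISS or SPADS; the sketch assumes that the optional TIME clause follows the command and its arguments, that fields are separated by single spaces, and that a carriage return closes the command. NWISS prompts are left out because the player never types them.

```python
def format_for_command(addressee, command, args=(), start_minute=None):
    """Assemble 'FOR <addressee> <command> [TIME <start-minute>] <CR>'.

    Hypothetical helper: fields are separated by single spaces, the
    optional TIME clause (assumed to follow the command's arguments) is
    added only when a start minute is given, and a carriage return
    marks the end of the command.
    """
    fields = ["FOR", addressee, command, *args]
    if start_minute is not None:
        fields += ["TIME", str(start_minute)]
    return " ".join(fields) + "\r"


print(repr(format_for_command("VA024", "ALTITUDE", ["4000"])))
# 'FOR VA024 ALTITUDE 4000\r'
```

Omitting `start_minute` corresponds to the usual NPS practice of real-time execution.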

The "addressee" and "force-name" fields have the
same domain, i.e. all legitimate force-names are specified
in the NWISS force-building and scenario definition phases.
For the Sea of Japan scenarios, the addressable Blue forces
number 9 ships, 1 shorebase and potentially 100 aircraft
(aircraft do not have callsigns assigned and hence are not
addressable until launched). Aircraft callsigns consist of
2 alphabetic characters followed by 3 digits, whereas
shipnames and shorebases may be abbreviated to the first 5
characters, e.g. RATHBourne. In addition, task force
designators may be used to address collective segments of the
Blue forces or individual units (not aircraft). Thus "FOR
1.1" addresses the entire Blue task group including all
aircraft, while "FOR 1.1.0.0" addresses the Kittyhawk but not
her aircraft. Other than "1.1", these designators are
seldom if ever used and are in a sense redundant. For that
reason, plus a technical problem with defining periods as
part of object names in the Verbex 3000, only "1.1" is
permitted in this application. See Table II for all
allowable force-names.
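The addressing rules just described can be expressed as a small validator. This Python sketch is an illustration under stated assumptions, not NWISS code: the ship and shorebase names are transcribed from Table II, aircraft callsigns are checked only against the two-letters-plus-three-digits pattern (launch status is not tracked), and "1.1" is the only task force designator accepted, as in this application.

```python
import re

# Full names transcribed from Table II; NWISS accepts the first five characters.
SHIPS_AND_SHOREBASE = ["KITTYHAWK", "KNOX", "LOSANGELES", "MCCORMICK", "OMAHA",
                       "RATHBOURNE", "SPRUANCE", "WICHITA", "WILSON", "MISAWA"]
AIRCRAFT_CALLSIGN = re.compile(r"[A-Z]{2}[0-9]{3}")  # e.g. MP604, VA024


def is_valid_addressee(name):
    """Sketch of the addressing rules: '1.1', an aircraft callsign, or a
    ship/shorebase name given with at least its first five characters."""
    name = name.upper()
    if name == "1.1":  # the only task force designator permitted here
        return True
    if AIRCRAFT_CALLSIGN.fullmatch(name):
        return True
    # Names shorter than five letters (KNOX) must be given in full.
    return any(full.startswith(name) and len(name) >= min(5, len(full))
               for full in SHIPS_AND_SHOREBASE)
```

For example, "KITTY", "MP604" and "1.1" pass, while "KIT" (too short to be unique) and "1.1.0.0" (excluded designator) fail.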

b. "LAUNCH" Command

The longest and most difficult "FOR" command to 
learn how to enter properly to NWISS is that for launching 
aircraft. Ability to correctly launch aircraft is of course 
indispensable to game play. To avoid too much complication 
and be in conformity with how the command is most often 
used, its simplest syntax will be discussed and is shown 
below:
TABLE II

Sea of Japan Scenario Blue Force-names

Ships         Aircraft      Shorebase
KITTYhawk     MP600-608     MISAWa
KNOX          SH100-103
LOSANgeles    VA000-033     Task Group
MCCORmick     VE000-003     1.1
OMAHA         VE100
RATHBourne    VF000-019
SPRUAnce      VH000-005
WICHIta       VK000-003
WILSOn        VS000-009
              VT000-003
              VW000-003





FOR <addressee> LAUNCH <number> <aircraft-type> (event name)
<name> (course) <degrees> (speed) <knots> (altitude) <feet>

Upon accepting this "first- level" command, NWISS responds 
with the prompt "FLT PLAN:" on a new line. In theory the 
user may now specify any of about 25 commands applicable to
aircraft. However, the normal NPS practice is to provide a
"MISSION" for the aircraft, "LOAD" the aircraft with
expendables, possibly specify a "PROCEED POSITION", and signify
the completion of the launch command with "STOP". Except
for "STOP", any of the other commands is followed by the 
prompt "FLT PLAN:" on a new line. Figure 2.2 shows a 
complete launch command. 
"FOR KITTY LAUNCH 6 F14A (event name) F14A1 (course)
210 (speed) 650 (altitude) 2000"
FLT PLAN: "MISSION CAP"
FLT PLAN: "PROCEED POSITION 36-30N 137-45E"
FLT PLAN: "LOAD (equipment) 2 PHENX 2 SPAR 2 SWDE"
FLT PLAN: "STOP"

Figure 2.2 Example of Complete Launch Command.
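The launch dialogue is thus a first-level command followed by a series of flight-plan entries, each answered by a new "FLT PLAN:" prompt until "STOP". The Python sketch below lists the entries a player would make; the helper name and argument order are hypothetical, and the NWISS prompts shown in parentheses in Figure 2.2 are omitted because the player does not type them.

```python
def launch_sequence(addressee, number, aircraft_type, event_name,
                    course, speed, altitude, flight_plan):
    """Return the entries a player makes for a complete launch (hypothetical helper).

    The first entry is the first-level LAUNCH command; each following entry
    answers one "FLT PLAN:" prompt, and "STOP" ends the launch command.
    """
    first_level = (f"FOR {addressee} LAUNCH {number} {aircraft_type} "
                   f"{event_name} {course} {speed} {altitude}")
    return [first_level, *flight_plan, "STOP"]


# Values taken from Figure 2.2 (position entry left out for brevity).
for line in launch_sequence("KITTY", 6, "F14A", "F14A1", 210, 650, 2000,
                            ["MISSION CAP", "LOAD 2 PHENX 2 SPAR 2 SWDE"]):
    print(line)
```

Keeping the flight plan as an open list mirrors NWISS, which re-prompts after every entry except "STOP".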

c. Graphics Display and Other Commands

In addition to the "FOR" commands, there is a
large repertoire of commands for controlling the
characteristics of the NWISS graphics output. In general, the
geographical area of operation can be centered about any
force-name, track, or position specified, and its radius can
be made as small or as large as desired. An X-mark, circle
or grid (set of concentric circles plus 12 lines of bearing
spread 30 degrees apart) can be placed over any force, track
or position. NWISS-generated lines of bearing (LOB) for
passive sonar or ESM (electronic support measures) may be
erased or turned back on. Finally, there are other
non-graphics commands for executing a preplanned launch (the
Sea of Japan scenario provides five "canned" Blue launch plans
which allow the player to get many aircraft up at once),
obtaining bearing and range information, or overriding the
NWISS-generated NTDS assignments for friendly, neutral or
enemy platforms. Figure 2.3 shows the syntax of some of these
commands, each with an example.
PLACE (a) CIRCLE (around) FORCE <force-name>
          GRID            TRACK <track #>
                          POSITION <lat> <long>
      (radius) <nautical-miles>
"PLACE (a) CIRCLE (around) FORCE MP604 (radius) 60"

CENTER (plot at) FORCE <force-name>
                 TRACK <track #>
                 POSITION <latitude> <longitude>
"CENTER (plot at) FORCE KITTY"

<CTRL-F> <plan-name>
"<CTRL-F> F14STECAP-PRE"

BEARING (and range from) FORCE <force-name>
                         TRACK <track #>
                         POSITION <lat> <long>
        (to) FORCE <force-name>
             TRACK <track #>
             POSITION <lat> <long>
"BEARING (and range from) FORCE KNOX (to) TRACK 3S004"

DESIGNATE (as) FRIENDLY <track #>
               NEUTRAL
               ENEMY
"DESIGNATE (as) ENEMY 3U007"

Figure 2.3 Graphics Display and Other Player Commands.

2. NWISS Data Requirements

As shown in Figure 2.3, execution of preplanned
launches is accomplished with a "control-F" followed by the
plan name. Of greater importance is the "control-K"
character, with which the NWISS player may cancel any command
prior to its complete entry and acceptance by NWISS. There
are also word and character erase functions in NWISS, but
they are not pertinent to a continuous voice application,
which outputs buffered strings of data vice keyboard
character-by-character output. This means that the output from
the voice recognizer must be at least syntactically correct
and thus in conformity with the syntax examples shown above.
In general, output strings must have spatial separation of
command keywords and arguments, and spaces must be kept out
of digit strings which are meant to be contiguous, since
NWISS can interpret intervening spaces as completion of the
field (e.g. launching an F14 at 5 vice 5000 feet). Another
requirement is to signify completion of command entry by a
<CR> (carriage return).
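The two spacing rules and the terminating carriage return can be illustrated by a sketch that joins recognized words into one output string. The representation is an assumption made for illustration: each field (one keyword or argument) arrives as its own list of spoken words, with the field boundaries already known from the grammar structure. In the actual application this restructuring is done by the Pascal procedure; the function `assemble_command` is hypothetical.

```python
def assemble_command(fields):
    """Join recognized fields into one NWISS input string (sketch).

    Each field is the list of words recognized for one keyword or argument,
    e.g. ["5", "0", "0", "0"] for an altitude. Words inside a field are run
    together so digit strings stay contiguous; fields are separated by one
    space, and a carriage return signals completion of the command.
    """
    return " ".join("".join(words) for words in fields) + "\r"


spoken = [["FOR"], ["VA024"], ["ALTITUDE"], ["5", "0", "0", "0"]]
print(repr(assemble_command(spoken)))  # 'FOR VA024 ALTITUDE 5000\r'
```

Joining every spoken word with a space would instead produce "ALTITUDE 5 0 0 0", which NWISS would read as an altitude of 5 feet.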

B. GRAMMAR DESIGN
1. Overview

As noted earlier, ten grammars have been defined for
the NWISS continuous speech application. While the
strictly-formatted structure of the NWISS commands combined
with the Verbex upper limit on grammar size helps to
determine the overall grammar design, there are numerous factors
to consider in building the software (both grammars and
Pascal procedure) for any application. According to Verbex
[Ref. 8: p. 60], the grammar design goals are:

• to improve recognition accuracy

• to allow for feedback

• to allow continuity of speech

• to allow natural pauses

• to reduce response time

• to allow error correction

• to reduce storage requirements

As with most sets of goals, there are some incompatibilities 
among them and tradeoff decisions must be made. This 
chapter section will briefly describe the Verbex tools used 
to define grammars, describe the major grammars built for 
NWISS in detail, and, wherever appropriate, discuss grammar 
design goals and tradeoffs vis-a-vis the NWISS application.

2. Verbex Grammar Design Tools

a. Verbex Standard Notation (VSN)

In a takeoff on Backus-Naur Form (BNF), Verbex
has created a very logical and understandable means for
defining grammars, which must first be described in order to
define the NWISS grammars. The basic element of VSN is the
object, which may be either simple or compound. A simple
object is a word that the user actually speaks. A compound
object is a category, or group of objects, and is so denoted
by placing a period in front of it. Note that a compound
object can represent a group of compound objects as well as
simple objects. Within a compound object definition,
alternative objects are arranged vertically and consecutive
objects are arranged horizontally. Thus to define a
compound object which is used in every NWISS grammar, we
write



. digi t 



0 

1 

2 

3 

4 

5 

6 

7 

8 
9 



to represent the fact that any 
spoken wherever .digit appears 
object or grammar definition. 



one of the ten digits may be used frequently in a higher level compound object. Thus

     .course ::= .digit .digit .digit

could suffice as the VSN definition of a compound object in NWISS. It may be noted that sometimes a course is specified with only 1 or 2 digits. VSN defines optional objects with brackets. Hence a better definition of .course might be

     .course ::= .digit [.digit] [.digit]

to allow such variation. However, this definition of "course" implies a time constraint: the Verbex 3000 will consider the above string to be complete after the first half-second of silence. Thus the user needs to know that, if a course is to be specified with 3 digits, it must be spoken as a continuous string without pauses. The tradeoff here is whether users must always enter "course" as 3 digits (in which case the Verbex 3000 waits until its timeout threshold, approximately one minute, to receive all 3 digits) or whether they can enter "course" as 1, 2, or 3 digits with no intermittent pauses.

Finally, VSN allows an unspecified amount of repetition of the same object through use of the "+". In NWISS, aircraft altitudes can vary by several digits in length. Thus the following definition

     altitude ::= .digit+

accomplishes the same task as several consecutive, bracketed, identical objects. Again, however, the same time constraint is imposed on the user as above with brackets: no intermittent pauses. On the other hand, a user could speak digits continuously, up to the Verbex 3000 buffer limit of 258 digits², with the above definition.
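As a rough modern illustration (not part of the Verbex software), the two VSN definitions above can be mimicked with regular expressions over the space-separated words a recognizer emits. The sketch assumes each recognized digit arrives as a single word separated by spaces; the names `COURSE`, `ALTITUDE`, `is_legal`, and `altitude_ok` are hypothetical. It models only which word strings are legal, not the half-second silence rule that ends an utterance.

```python
import re

DIGIT = r"[0-9]"      # .digit: any one of the ten digit words
BUFFER_LIMIT = 258    # Verbex 3000 buffer limit in digits, as cited above

# .course ::= .digit [.digit] [.digit]
COURSE = re.compile(rf"{DIGIT}(?: {DIGIT})?(?: {DIGIT})?")

# altitude ::= .digit+
ALTITUDE = re.compile(rf"{DIGIT}(?: {DIGIT})*")


def is_legal(pattern: re.Pattern, words: str) -> bool:
    """True if the whole word string is a legal path through the pattern."""
    return pattern.fullmatch(words) is not None


def altitude_ok(words: str) -> bool:
    """A legal altitude that also fits in the recognizer's buffer."""
    return is_legal(ALTITUDE, words) and len(words.split()) <= BUFFER_LIMIT


if __name__ == "__main__":
    print(is_legal(COURSE, "2 7 0"))    # True: three digits
    print(is_legal(COURSE, "9"))        # True: digits 2 and 3 are optional
    print(is_legal(COURSE, "1 2 3 4"))  # False: at most three digits
    print(altitude_ok("2 5 0 0"))       # True
```

The bracketed objects become optional groups and the "+" becomes a repetition, which is why the altitude pattern needs the separate length check to reflect the buffer limit.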



²SPADS Training Course

According to Verbex [Ref. 8: p. 64], "a grammar is complete when every compound object included in its definition has been completely defined in terms of simple objects. There is no limit to the number of levels this definition can take, nor on the scope of complexity of any level." To see how this definition applies, the VSN definition of the NWISS grammar "Position" is shown in Figure 2.4. This grammar is confusing at first glance since it seems to provide for both latitude and longitude but without specifying which. That is precisely the job of the application code: it works in conjunction with the grammar design and calls Position twice for recognition, the first time with the prompt "LATITUDE" and the second time with the prompt "LONGITUDE". Here grammar efficiency is achieved because the application code takes advantage of the natural pause which occurs between latitude and longitude expressions.



     Position ::= [1] .digit .digit [.minutes] .direction
                  CONTROL_K

     .minutes ::= - .6digit .digit

     .direction ::= N
                    S
                    E
                    W

     .6digit ::= 0
                 1
                 2
                 3
                 4
                 5

     .digit ::= 0
                1
                2
                3
                4
                5
                6
                7
                8
                9

     Figure 2.4 NWISS Position Grammar.



With that understanding, we may note that Position has an optional complex object, .minutes, which the user is required to begin with the sentinel "dash". Since NWISS requires the dash from the keyboard as well, this does not seem burdensome. Less complexity and greater accuracy are achieved with the object .6digit defined as part of .minutes, since the tens digit of a minutes value can only be 0 through 5.

Note also that CONTROL_K is an alternative in this grammar as well as in all other NWISS grammars: the user may cancel the NWISS command at any point. More rationale for CONTROL_K will appear in the discussion of the grammar Nwisgram1. Further, the Verbex will output N, S, E, or W in accordance with the voice signals which the user trains for those symbols. In other words, the user may train and speak these as "North", "South", "East" and "West". Finally, note that the grammar will not prevent incorrect expressions: one could easily say "120N" for latitude or "95S" for longitude. The primary purpose of grammar definition is to define what can be said legally. Preventing illegal expressions is a "side" benefit which, if pursued too far, can cost too much in complexity, a subject of the next section.
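A minimal sketch of Position in use, assuming the Verbex outputs each recognized word separated by a space and renders the spoken "dash" as "-". The regular expression transcribes Figure 2.4, and `enter_position` is a hypothetical stand-in for the Pascal code that calls Position twice, once per prompt. It also illustrates the point just made: "1 2 0 N" is a legal path even though no such latitude exists.

```python
import re
from typing import Iterator, Optional, Tuple

# Position ::= [1] .digit .digit [.minutes] .direction | CONTROL_K
# .minutes ::= - .6digit .digit
POSITION = re.compile(
    r"(?:1 )?[0-9] [0-9]"     # optional leading 1, then two digits
    r"(?: - [0-5] [0-9])?"    # optional minutes: dash, .6digit, .digit
    r" [NSEW]"                # .direction
    r"|CONTROL_K"             # cancel is legal at any prompt
)


def enter_position(utterances: Iterator[str]) -> Optional[Tuple[str, str]]:
    """Simulate the application calling Position twice, once per prompt.

    `utterances` stands in for successive recognition results. Returns
    (latitude, longitude), or None if the user cancels with CONTROL_K.
    """
    answers = []
    for prompt in ("LATITUDE", "LONGITUDE"):
        words = next(utterances)
        if POSITION.fullmatch(words) is None:
            raise ValueError(f"{prompt}: not a legal path: {words!r}")
        if words == "CONTROL_K":
            return None
        answers.append(words)
    return answers[0], answers[1]
```

Range checking (latitude at most 90, the right hemisphere letters) would have to live in the application code; putting it in the grammar is exactly the extra complexity the text warns against.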

b. Verbex Grammar Editor

The Verbex 3000 SPADS has a menu-driven facility, called GRID, for creating grammars, which is basically user friendly. With GRID, the designer inputs potential grammars for an application in VSN form, pushes the SPADS "application generate" function key, and sits back. The result, if the grammars are not too complex, is a complete application less the Pascal control code. In other words, SPADS automatically interprets the GRID-built VSN file and builds recognition instructions for the Verbex 3000, telling it when and where to look for acoustic input, including long and short silences, for the grammars in that application. It also creates all the files necessary for user voice training and testing of the grammars.

Regarding complexity, SPADS generates a report for each grammar which states the number of individual words in that grammar, the percentage of machine vocabulary capacity that total represents, and, most importantly, a determination of the grammar's complexity expressed as a percentage of machine capacity. Complexity is based on both the number of words and the number of node transitions in the grammar. As this complexity percentage nears 100, the ability of the Verbex 3000 to remain "real time" is reduced [Ref. 5: pp. 72-75]. However, in the author's experience SPADS generally will not generate a grammar with complexity higher than 90% for the maximum capacity model 3000 (four speech processing boards). The formula used by Verbex to compute complexity is too cumbersome to describe here and is fully explained in [Ref. 8]. However, there are three important factors in the computation:

1. Total number of distinct simple objects or words in the grammar;

2. Total number of words that may occur as the first word of a legal path through the grammar;

3. The average length of all possible paths through the grammar.

To provide some yardstick measure of complexity for comparison with other grammars, the report for the grammar Position is:

     Total vocabulary is 16 words
     Vocabulary is 5% of capacity
     Complexity is 41% of capacity.

The average path length is the major factor in the complexity of the Position grammar. Most of the NWISS grammars fall in the 40 to 60% range for complexity.
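The Verbex formula itself is not reproduced here. As a hedged illustration, the sketch below computes only the three factors listed above for a grammar written as a Python dictionary, with optional elements modelled as empty alternatives (a convention of this sketch, not VSN syntax). Applied to the Position grammar of Figure 2.4, it counts 16 distinct words, in agreement with the SPADS vocabulary figure; how Verbex weights the factors into a percentage is not modelled.

```python
from itertools import product
from typing import Dict, List, Tuple

# Each complex object maps to its alternatives; each alternative is a
# sequence of symbols. Symbols that are not keys are simple objects (words).
# An empty alternative [] models an optional [ ... ] element.
Grammar = Dict[str, List[List[str]]]


def expand(grammar: Grammar, symbol: str) -> List[Tuple[str, ...]]:
    """Return every word sequence (path) derivable from `symbol`."""
    if symbol not in grammar:
        return [(symbol,)]
    paths = []
    for alternative in grammar[symbol]:
        # Cartesian product of the expansions of each element in sequence.
        for parts in product(*(expand(grammar, s) for s in alternative)):
            paths.append(tuple(word for part in parts for word in part))
    return paths


def complexity_factors(grammar: Grammar, start: str) -> Tuple[int, int, float]:
    """Distinct words, distinct first words, and average path length."""
    paths = expand(grammar, start)
    words = {w for path in paths for w in path}
    first_words = {path[0] for path in paths if path}
    average = sum(len(path) for path in paths) / len(paths)
    return len(words), len(first_words), average


POSITION: Grammar = {
    "Position": [[".one", ".digit", ".digit", ".minutes", ".direction"],
                 ["CONTROL_K"]],
    ".one": [["1"], []],                           # [1]
    ".minutes": [["-", ".6digit", ".digit"], []],  # [.minutes]
    ".direction": [["N"], ["S"], ["E"], ["W"]],
    ".6digit": [[str(d)] for d in range(6)],
    ".digit": [[str(d)] for d in range(10)],
}

if __name__ == "__main__":
    print(complexity_factors(POSITION, "Position"))
```

Listing every path is practical for Position (48,801 paths) but grows multiplicatively with grammar size; a production tool would count paths per node rather than enumerate them.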

NWISS Grammars

a. Nwisgram1

The first grammar from which the Verbex 3000 attempts to recognize NWISS commands is Nwisgram1. This grammar is called in for recognition by the Pascal application code as soon as the previous command has been output to NWISS. Its purpose is to allow the user to begin any legal NWISS command and then, based on the recognition result, allow the application code to call in the next appropriate grammar. The ways in which one can begin a command are numerous:

     FOR <addressee>
     Graphics command
     Other non-graphics command
     "FOR" command without "FOR <addressee>" (when directed to the same addressee as before)

Based on trial-and-error experience, these requirements are too large to be handled by any single grammar on the Verbex 3000. Using two or more grammars will not solve the problem, since at any instant only one can be recognized and there is no prior knowledge as to which is needed. The only possible solution is a compromise with the requirements:

1. Eliminate the possibility of beginning a "FOR" command without the "FOR <addressee>". This seems only a slight inconvenience to the user.

2. Create a sentinel for graphics commands so that a separate grammar can be called in upon recognition of the sentinel. This was done using the sentinel word "display". Hence the user must say "display" and pause for a half-second before entering any graphics commands. This is an inconvenience, but in practice the author had little difficulty in adapting to it.

3. A third possibility is to divorce "FOR" from <addressee> so that the user is required to pause after "FOR" while the application loads a grammar containing all the Blue force names. This was tried and proved difficult, as it violates the rule of placing grammar boundaries where natural pauses occur. "FOR KITTY" is much more natural than "FOR", pause, "KITTY".

Another design question was how to handle "control_f" for executing predefined Blue launch plans. Should the user say "control_f" or some more meaningful word such as "execute"? The latter alternative was chosen as being easier to associate with plans and less likely to be confused with the other control character, "control_k", used for cancelling commands in mid-stream. The control_k character is used frequently in NWISS, has a distinctive acoustic pattern, and is shorter than a phrase such as "cancel command". Hence "control_k" is spoken as it is written. In both cases, the application code converts the recognized string to the proper ASCII output character for NWISS.

Similarly, the application code can make the user's task easier by not requiring "pre" to be said after each plan name. Thus the plans are specified as A6STRIKE, BCAP1, et cetera, and the application code takes the recognized plan name and attaches ".PRE" to it as required for NWISS.
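The two conversions just described can be sketched as follows. This is a hypothetical Python rendering, not the Appendix A Pascal: the function name is illustrative, and the exact layout NWISS expects (for example, whether anything separates Control-F from the plan name) is an assumption. Control-F and Control-K are the standard ASCII codes 6 and 11.

```python
CTRL_F = "\x06"  # ASCII code produced by Control-F
CTRL_K = "\x0b"  # ASCII code produced by Control-K


def convert_command(recognized: str) -> str:
    """Turn a recognized word string into the text sent to NWISS.

    Assumes the Verbex returns words separated by single spaces and that
    NWISS expects the control character immediately before the plan name.
    """
    words = recognized.split()
    if words == ["CONTROL_K"]:
        return CTRL_K                      # spoken "control_k" cancels
    if len(words) == 2 and words[0] == "EXECUTE":
        return CTRL_F + words[1] + ".PRE"  # e.g. BCAP1 becomes BCAP1.PRE
    return recognized                      # everything else passes through
```

Keeping such mappings in one place makes it easy to see how many conversions the application code is carrying, a cost discussed below.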

The requirements for the Nwisgram1 grammar have been refined well enough that the grammar can be specified as in Figure 2.5. As noted earlier, objects such as "1.1" cannot be entered into GRID: the period is illegal except as the first character of a complex object. Hence the application code must convert "1point1" to "1.1" for output to NWISS. Another application code conversion occurs with .aircraft because the Verbex 3000 outputs a space between

     Nwisgram1 ::= FOR .force
                   DISPLAY
                   EXECUTE .preplan
                   DESIGNATE .status
                   CONTROL_K
                   BEARING TRACK
                   BEARING FORCE
                   FORCE
                   TRACK
                   POSITION

     .status ::= FRIENDLY
                 NEUTRAL
                 ENEMY

     .preplan ::= A6STRIKE
                  BAIRASW
                  BCAP1
                  BCAP2

     .force ::= 1point1
                F14STRCAP
                .aircraft
                .ship/base

     .aircraft ::= MP6 0 .digit
                   3H1 0 .digit
                   VA0 .digit .digit
                   VE0 0 .digit
                   VE1 0 0
                   VF0 .digit .digit
                   VH0 0 .digit
                   VK0 0 .digit
                   VS0 0 .digit
                   VI0 0 .digit
                   VN0 0 .digit

     .ship/base ::= KITTY
                    KNOX
                    LOSAN
                    NCCOR
                    MISAW
                    OMAHA
                    RATHE
                    SPRUA
                    WICHI
                    RILSO

     Figure 2.5 NWISS Nwisgram1 Grammar.

each object it recognizes. Hence "MP604" is placed in a buffer as "MP6 0 4", and the program then removes the offending spaces prior to output to NWISS. Too many conversions such as "1.1", ".PRE" and aircraft callsigns add to the length and complexity of the application code and potentially can slow the real time capability of the Verbex 3000.
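A minimal sketch of the two .force conversions, assuming the recognized buffer holds words separated by single spaces and that NWISS accepts "FOR" followed by one space and the addressee. The function names are hypothetical; the Pascal version had to do the same work character by character.

```python
from typing import List


def convert_force(words: List[str]) -> str:
    """Rebuild a .force value from the words the Verbex recognized."""
    if words == ["1point1"]:
        return "1.1"  # GRID cannot store the period, so restore it here
    # Callsigns arrive as separate objects ("MP6", "0", "4"); ship and
    # base names are single words, so joining is harmless for them.
    return "".join(words)


def convert_for(buffer: str) -> str:
    """Convert a recognized 'FOR <addressee>' string for output to NWISS."""
    keyword, *force = buffer.split()
    return keyword + " " + convert_force(force)
```

For example, `convert_for("FOR MP6 0 4")` yields "FOR MP604", while a ship name such as KITTY passes through unchanged.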
Hence the effort should be made, where possible, to place objects in grammars exactly as they are to be output. The SPADS report for Nwisgram1 is:

     Total vocabulary is 49 words
     Vocabulary is of capacity
     Complexity is 48% of capacity.

Here, by comparison, the complexity is somewhat higher than that (41%) for Position, due to the driving factor of total number of words.

b. Force, Track, and Position

The last three objects of Nwisgram1's top level definition are not just objects but also the names of three individual grammars. Their appearance in Nwisgram1 is necessitated by the syntax of the "BEARING" command as shown in Figure 2.3. FORCE, TRACK and POSITION have been specified as individual grammars for three reasons: they are used often as command keywords; the size of their respective argument domains precludes lumping them together or inside some other grammar; and it can be argued that there is a natural pause after these keywords but before specification of their arguments. Since Position has already been examined (see Figure 2.4), the Force and Track grammars will be described.



     Force ::= 1point1
               .aircraft
               .ship/base
               .orngbases
               CONTROL_K

     .orngbases ::= WONSA
                    ALEXS
                    PETRO
                    VLAD



The Force grammar is quite similar to the complex object .force in Nwisgram1. The only difference is the addition of .orngbases, because NWISS allows the Blue player to request bearings on Orange bases by name as well as by position for convenience. Hence the redundancy between Nwisgram1 and Force is considerable but necessary: one grammar cannot be devised to do the job of both. This grammar overlap has a cost in terms of storage space and user training time but is unavoidable. The SPADS report for Force is:

     Total vocabulary is 37 words
     Vocabulary is 11% of capacity
     Complexity is 58% of capacity.

The number of initial path words (27) is the major factor in this complexity figure.

The Track grammar is relatively small and simple but is called quite often:

     Track ::= EAO .digit .digit
               EEO .digit .digit
               EPO .digit .digit
               ESO .digit .digit
               EDO .digit .digit
               CONTROL_K

The SPADS report for Track is:

     Total vocabulary is 15 words
     Vocabulary is 5% of capacity
     Complexity is 24% of capacity.



To see how the application code comes into play, consider the BEARING command: when the user says "BEARING FORCE", for example, a complete path through Nwisgram1 exists and the result is placed in a buffer. The application code analyzes the buffer contents, determines that they should be sent unchanged to NWISS with a space after "FORCE", and then calls in grammar Force for recognition. It checks the new contents for necessary conversions, outputs the converted string with a trailing space, and returns to Nwisgram1 to see what the next keyword in the command will be (a choice of FORCE, TRACK, or POSITION). It then outputs this keyword followed by a space, calls in the keyword-specified grammar, performs any necessary conversions, and outputs the string followed by a <CR>.
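The control flow just described can be sketched as below. This is a hedged Python illustration, not the Appendix A Pascal: `recognize` is a hypothetical callable standing in for Recognize plus its buffer, `squeeze` is a placeholder for the conversions discussed earlier, and the sketch treats Position like the other argument grammars even though the application actually calls it twice (latitude, then longitude). It also assumes that CONTROL_K at any step sends Control-K and abandons the command.

```python
from typing import Callable, List

CR = "\r"
CTRL_K = "\x0b"


def squeeze(words: str) -> str:
    """Placeholder conversion: drop the spaces the Verbex puts between objects."""
    return "".join(words.split())


def bearing_command(first: str, recognize: Callable[[str], str]) -> List[str]:
    """Sequence the grammars for a BEARING command.

    `first` is the Nwisgram1 result, e.g. "BEARING FORCE"; `recognize(name)`
    returns the words recognized with grammar `name`. Returns the pieces
    written to NWISS, in order.
    """
    output = [first + " "]                                # "BEARING FORCE "
    argument = recognize(first.split()[1].capitalize())  # Force or Track
    if argument == "CONTROL_K":
        return output + [CTRL_K]
    output.append(squeeze(argument) + " ")
    keyword = recognize("Nwisgram1")                      # FORCE, TRACK or POSITION
    if keyword == "CONTROL_K":
        return output + [CTRL_K]
    output.append(keyword + " ")
    argument = recognize(keyword.capitalize())            # keyword-specified grammar
    if argument == "CONTROL_K":
        return output + [CTRL_K]
    output.append(squeeze(argument) + CR)
    return output
```

With scripted results "MP6 0 4", "TRACK" and "EAO 1 2", the pieces join to "BEARING FORCE MP604 TRACK EAO12" followed by a carriage return, and the grammars are called in the order Force, Nwisgram1, Track.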

c. Display

As previously discussed, Nwisgram1 contains the sentinel "DISPLAY" to allow the application control software



     Display ::= RADIUS .digit+
                 DROP TRACK
                 CENTER FORCE
                 CENTER POSITION
                 PLACE .what .where
                 CANCEL .what .where
                 PLOT LOB ESM
                 PLOT LOB SONAR
                 ERASE LOB ESM
                 ERASE LOB SONAR
                 CONTROL_K

     .what ::= CIRCLE
               GRID
               XMARK

     .where ::= FORCE
                TRACK
                POSITION
                ALL

     Figure 2.6 NWISS Display Grammar.

to call in the grammar with that name, shown in Figure 2.6. The SPADS report on Display is:

     Total vocabulary is 28 words
     Vocabulary is of capacity
     Complexity is of capacity.

     Airorders ::= ALTITUDE .digit+
                   BARRIER
                   BINGO
                   CONTROL_K
                   COURSE .digit+
                   COVER
                   EMCON .plan
                   FIRE
                   MISSION .mission
                   ORDERS
                   PROCEED POSITION
                   REFUEL VK0 0 .4digit
                   SPEED .digit+
                   TAKE
                   WEAPONS FREE .how
                   WEAPONS TIGHT

     .mission ::= AEW
                  AIRTANKER
                  ASW
                  CAP
                  SEARCH
                  STRCAP
                  STRIKE
                  SUBCAP

     .how ::= AIR
              ALL
              ENEMY .enemy
              SUBMARINE
              SURFACE

     .plan ::= AEN
               AIRS
               RADIA
               SILEN
               SONAR
               SURF

     .enemy ::= AIR
                ALL
                SUBMARINE
                SURFACE

     .4digit ::=

     Figure 2.7 NWISS Airorders Grammar.

d. Airorders and Shiporder

These two grammars are the largest and most complex developed for NWISS. The difference of singular vs. plural in the names is due to GRID not accepting grammar names longer than nine characters. As the names indicate, there is sufficient difference between the types of orders that aircraft and ships receive to justify individual grammars for reasons of reduced complexity and increased accuracy. One of these two grammars is called from Nwisgram1 every time "FOR <addressee>" is recognized (both 1.1 and MISAW call Shiporder). Figure 2.7 contains Airorders. Here is the only instance of command arguments being excluded from a grammar because of lack of use at NPS and the need for efficiency: there are actually 14 possible aircraft missions in NWISS, but only 8 have been included in Airorders.

Figure 2.8 contains Shiporder. Comparison with Airorders shows that the two grammars have nine commands in common. SPEED is defined differently in Shiporder because ship speeds can be narrowly defined. If ships ever go faster than 39 knots, the grammar must be changed. The SPADS report for Shiporder is:

     Total vocabulary is 42 words
     Vocabulary is 12% of capacity
     Complexity is 58% of capacity.
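The 39-knot ceiling follows directly from the two SPEED alternatives in Figure 2.8 (SPEED .digit and SPEED .3digit .digit, with .3digit limited to 1, 2 or 3). The short sketch below is a hypothetical Python check, not anything SPADS produces; it enumerates the speeds those paths can express.

```python
from typing import Set

DIGITS = range(10)       # .digit ::= 0 ... 9
THREE_DIGIT = (1, 2, 3)  # .3digit ::= 1 | 2 | 3


def shiporder_speeds() -> Set[int]:
    """Every speed (knots) that a legal Shiporder SPEED path can express."""
    single = set(DIGITS)                                        # SPEED .digit
    double = {10 * t + d for t in THREE_DIGIT for d in DIGITS}  # SPEED .3digit .digit
    return single | double
```

The result is exactly the speeds 0 through 39, so a faster ship would need a wider .3digit object.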

e. Launch

Because of its difficult and lengthy syntax (see Figure 2.2) and a large vocabulary requirement, due to the large number of aircraft types which can be launched and the large number of expendables which can be loaded on the aircraft, the LAUNCH command merits its own grammar. The Launch grammar can only be called from Shiporder when that



     Shiporder ::= BLIP .on/off
                   CONTROL_K
                   COURSE .digit+
                   DECK .on/off
                   DEPTH .digit+
                   EMCON .plan
                   FIRE
                   LAUNCH
                   ORDERS
                   PERISCOPE
                   PROCEED POSITION
                   RBOC .on/off
                   SPEED .digit
                   SPEED .3digit .digit
                   STATION
                   SURFACE
                   TAKE
                   WEAPONS FREE .how
                   WEAPONS TIGHT

     .on/off ::= ON
                 OFF

     .3digit ::= 1
                 2
                 3

     .plan ::= AEN
               AIRS
               RADIA
               SILEN
               SONAR
               SURF

     .how ::= AIR
              ALL
              ENEMY .enemy
              SUBMARINE
              SURFACE

     .enemy ::= AIR
                ALL
                SUBMARINE
                SURFACE

     Figure 2.8 NWISS Shiporder Grammar.

command is stated by the user. As noted earlier, only a simplified version of the command syntax is supported in this application, and hence the grammar is smaller than might

     Launch ::= .digit .acfttype .acfttype .digit
                CONTROL_K
                LOAD .digit [.digit] .wpnsload
                PROCEED POSITION
                STOP

     .acfttype ::= A6E
                   A7E
                   S2C
                   EA3B
                   EA6B
                   F14A
                   F14T
                   KA6D
                   P3C
                   S3A
                   SH2F
                   SH3H

     .wpnsload ::= HARP
                   MK46A
                   MK82
                   MK83
                   MK84
                   PKENX
                   SHRIK
                   SPAR
                   S2538
                   SSQ47
                   SSQ53
                   SSQ62
                   SWDH
                   NALLI

     Figure 2.9 NWISS Launch Grammar.

broken into parts. However, the application code does not branch to other grammars but simply recalls Fire until the command is complete. The SPADS report for Fire is:

     Total vocabulary is 26 words
     Vocabulary is 8% of capacity
     Complexity is 42% of capacity.

     Fire ::= .digit .cruistype CRUISE
              .digit .torptype TORPEDO
              AT .base
              BEARING .digit+
              RANGE .digit+
              CONTROL_K

     .cruistype ::= TOMHK
                    HRPON

     .torptype ::= ASROC
                   MK46
                   MK46A
                   MK48

     .base ::= ALEXS
               PETRO
               VLAD
               WONSA

     Figure 2.10 NWISS Fire Grammar.



C. PASCAL PROCEDURE DESIGN

Numerous references to the "application code" in preceding sections have already indicated what its purposes are:

• to provide for the control of the interactive process between the user and the Verbex 3000;

• to control data output to the host process, NWISS.

Hence calling the correct grammars in sequential order is not enough: feedback must be provided to the user so that he knows the machine's status at all times. In part this is accomplished automatically by indicator lights on the Verbex 3000 User's Console. In part it is accomplished by the system response of the host process to which the user is inputting commands. Finally, with regard to the subject at hand, it is also accomplished in part by the visual and aural messages which the application program generates to the user through the User's Console. (No aural feedback is used in the NWISS application.)

Appendix A of this thesis contains the approximately 300 lines of Pascal code used to control the NWISS continuous voice application. This section describes the Verbex predefined functions and types which appear repeatedly throughout Appendix A and explains the reasons underlying the programming techniques used.

1. Verbex Predefined Functions and Types

It was noted in Chapter 1 that Verbex has created a library of about 20 predefined functions to ease the programmer's task. However, SPADS is still in beta test status, and many of these functions do not work yet, though they are defined. Of those that work, only three are used in the NWISS application; they are defined below:

1. Recognize(grammarname, buffername) is the workhorse function. It tells the Verbex 3000 to begin listening for acoustic signals matching the named grammar and to place the output result in the named buffer (of type "string"). All Verbex functions return a value of type "short" to indicate success, failure, or some other appropriate result. Recognize can return four values: 1) voicein means success; 2) recognize_bad means failure; 3) timeout means no input which could be classified as recognizable or unrecognizable was received for approximately one minute; 4) keypadin means the recognition cycle was interrupted by the user from the User's Console keypad.

2. Hostwrite(buffername) tells the Verbex 3000 to write the contents of the named buffer (string) to the host computer (NWISS in this case) and simply returns success or failure.

3. Displaymessageclear(displayprimary, message) tells the Verbex 3000 to write the message (of type string) to the 32-character display on the User's Console and simply returns success or failure.
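Because every recognition cycle can end in any of the four Recognize results, the application has to branch on each one. A hedged Python sketch of that dispatch follows. The enum mirrors the four result names above, while `listen`, the retry limit, and the "PLEASE REPEAT" message are hypothetical choices; `recognize` and `display` are stand-ins for Recognize and Displaymessageclear, not their real signatures.

```python
from enum import Enum, auto
from typing import Callable, Optional, Tuple


class Result(Enum):
    VOICEIN = auto()        # success: the buffer holds the recognized string
    RECOGNIZE_BAD = auto()  # speech heard but not matched to the grammar
    TIMEOUT = auto()        # nothing heard for about one minute
    KEYPADIN = auto()       # recognition interrupted from the console keypad


def listen(grammar: str,
           recognize: Callable[[str], Tuple[Result, str]],
           display: Callable[[str], None],
           max_tries: int = 3) -> Optional[str]:
    """Call `recognize` until it succeeds; return the buffer, or None."""
    for _ in range(max_tries):
        result, buffer = recognize(grammar)
        if result is Result.VOICEIN:
            return buffer
        if result is Result.KEYPADIN:
            return None                   # hand control back to the user
        if result is Result.RECOGNIZE_BAD:
            display("PLEASE REPEAT")      # hypothetical message text
        # TIMEOUT: simply listen again
    return None
```

Centralizing the four cases in one routine is one way to avoid repeating the same branching after every grammar call.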

Some other functions were investigated but found not to work. This was unfortunate, as it made the programming task for NWISS a rather tedious one in terms of comparing and manipulating character strings character by character. Wordcount(string), Wordfind(string), Stringcopy(string1, string2), and Stringcompare(string1, string2) are fairly descriptive names of functions which are defined in [Ref. 5] and are expected to work with the next SPADS software release. Thus the Appendix A software will need a fair amount of revamping in order to take advantage of these new functions when they are available.

2. Programming Techniques

The job of the voice application programmer is to write a Pascal procedure with the name "application". Only that name will suffice, as the procedure is embedded by the SPADS compiler into the standard operating software for the Verbex 3000, where it is called at the appropriate time. The NWISS application procedure in Appendix A uses a high number of labels (35) and a proportionate number of GOTO statements. This is due in part to the fact that it was not possible to define functions of the predefined type "string" such as will be available from Verbex, and in part to the fact that the GOTO statement is efficient and saves one from indenting off the right side of the page in a highly nested environment, which easily results when jumping from grammar to grammar. The program is 25,000 bytes long and hence close to the Verbex 3000 upper limit of 30,000 bytes.³ For this reason, and to allow room for growth, the comments have been kept to a minimum but are intended to be adequate for the purpose of future updates and maintenance.



³SPADS Training Course

III. USER'S GUIDE

To use the NWISS continuous voice application properly, one must invest approximately three hours, both in learning the operation of the Verbex 3000 voice terminal and in training voice patterns to the grammars. This investment may be two or three times more than that required to become proficient at inputting NWISS commands from the keyboard (given that one already has a fair amount of keyboard experience). Nevertheless, it is the author's opinion that the investment in voice input will more than pay for itself in time saved and reduced aggravation when several lengthy NWISS sessions are to be played. The reasons for this opinion have already been stated in several places in this thesis and stem from the inherent advantages of voice input: naturalness, speed, and hands-free and eyes-free (relative to a keyboard) input. Further, it is difficult to output mistakes, in the sense that the NWISS grammars only have "correct" objects to be output, and substitution errors are exceedingly rare if a person has taken the necessary time to train voice patterns properly. This chapter will move sequentially through the steps which a prospective user of the NWISS continuous voice application should follow in becoming proficient.

A. LEARNING OPERATION OF THE VERBEX 3000 USER'S CONSOLE

Verbex has published a very readable, illustrated operating manual called the Supervisor's Manual [Ref. 9]. This manual should be skimmed and referred to during the user's first login to the voice terminal. The only amplifying instructions are that after the system is powered up and completes its self-boot, the user should type

     nwiss/nwiss

on the VT102 keyboard associated with the Verbex 3000. This will cause the NWISS application to be loaded.

B. ENROLLING THE NWISS VOCABULARY

The first step in training one's voice patterns on the Verbex 3000 is to enroll the entire vocabulary (the NWISS vocabulary is 151 words) at one time. This means that the Verbex 3000 will automatically step the user through all 151 words, requesting each to be spoken twice, occasionally three times, to get an initial set of voice patterns for each word. This process should take only fifteen minutes.

In order to make the enrollment process go smoothly, the user should take time beforehand to look at the vocabulary and decide what pronunciation will be given each word. As many of the NWISS words are really just symbols put together to make an aircraft type, weapon type, track number, callsign, et cetera, it is important to do this beforehand. See Table III for a complete NWISS vocabulary ordered (column by column) the same way as the Verbex 3000 will present it. Suggested rules of thumb for pronunciation are:

1. In general, use the most natural pronunciation which comes to mind.

2. Pronounce numbers which appear as part of fixed identifiers naturally, e.g. "F14A" as "F fourteen A" or "MK48" as "mark forty-eight".

3. Do pronounce digits which appear as part of variable strings, e.g. callsigns, track numbers, bearings, et cetera, as individual digits, e.g. "ALTITUDE 2500" as "ALTITUDE two five zero zero".



4. Pronounce -, N, E, W, and S as "dash", "north", "east", "west" and "south".

5. Do not use the phonetic alphabet for individual letter pronunciation. It is neither natural nor necessary. For example, pronounce "VA024" as "V A zero two four", not "victor alpha ...".

6. Give the full pronunciation to abbreviated ship names, shore base names, and weapon names, e.g. "Spruance", "Tomahawk", "Harpoon", "Sidewinder", "Misawa". ("Kitty" is perfectly acceptable for "Kittyhawk".)

C. TRAINING THE GRAMMARS

The next step after enrollment is to train the NWISS grammars. This means that the Verbex 3000 will automatically step the user through a large number of triplets (3 words in a phrase). This training can be tedious, especially with the number of digits used in NWISS. However, it is very important to accomplish this training properly to get good recognition results. (After the enrollment phase, one could choose to test his or her recognition accuracy on the Verbex 3000 and would find scores ranging around 50 or 60. After the training phase, testing is automatically invoked and should show recognition scores in the 90's.)

To make this training less tedious, the following change has been made to the Verbex 3000 scheme of training: the 32-character display on the User's Console will ask whether it is desired to train

     ALLGRAMMARS?

The only correct response is NO. It will then ask the user which grammars to train individually. This is the desired mode. It allows the user to take a break between grammars for as long as desired. With "allgrammars", the machine retains no memory of where one leaves the ordered list of triplets and, in essence, a marathon training session is implied. For this reason, and to conserve space, the NWISS "allgrammars" is merely a shell with but a few entries to satisfy SPADS' expectations. Training the grammars individually should take about 15 or 20 minutes each, depending on size. The Digits grammar is last on the list and may be left untrained, as the user will get to train many digits in the other grammars. However, if digits ever seem a problem, then it may be worthwhile to train Digits as well as the other grammars.

D. TESTING

After each grammar has been trained, the Verbex 3000 will ask the user if testing is desired. This is a worthwhile two-minute exercise in which the Verbex 3000 displays complete legal paths (not just triplets) through the grammar and, after the user has spoken each, displays the recognition score for that utterance. Scores should generally be in the 90's with a few 80's. Scores in the 70's and below may indicate that retraining is needed.

However, complete paths through grammars are not complete NWISS commands. Users should test their "feel" for grammar boundaries, where pauses are required, by testing on NWISS itself (after training all grammars and prior to beginning operation). Figure 3.1 contains a fairly representative sample of NWISS commands which the user should attempt. Pauses are indicated by "..." and prompts are inside parentheses.

Figure 3.1 NWISS Test Commands.



E. OPERATIONAL USE

After the enrollment and training phases are over, the interesting phase begins: actual input to NWISS. What follows are a few suggestions to make this phase easier and, hopefully, trouble-free. In general, one should enter the NWISS commands in accordance with the display messages (which appear on the User's Console every time the Verbex 3000 must leave one grammar and call in another) and with the "feel" for those grammars obtained from training the triplets. Speech is continuous within a grammar but discrete across grammar boundaries. A few guidelines are:

1. A pause is always required after saying "FOR <addressee>", "DISPLAY", "FORCE", "TRACK", or "POSITION". (Wait for the appropriate display prompt before continuing.)

2. When speaking a field of digits, prepare ahead of time what they are and speak them continuously without pausing in the middle. However, this is not true of positions (latitude and longitude), aircraft callsigns, or track numbers (i.e. FORCE, TRACK, and POSITION), which are defined as fixed-length fields and may be entered as discretely or continuously as desired.

3. Simple commands which have only one argument of digits should be spoken continuously, e.g. "SPEED 35", "RADIUS 250", or "ALTITUDE 2000".

4. For the more difficult commands (i.e. FIRE, LAUNCH, BARRIER, and STATION), the command keyword itself serves as an entry point to other grammars and application code, and hence a pause is required after it.

5. Under "LAUNCH", <event-name> must be specified as the aircraft type plus a single digit, e.g. "F14A1" or "E3C2".

6. Conversation with other players can easily trigger recognition in the Verbex 3000 and cause unwanted output to NWISS. This can be prevented by swinging the headset microphone away and covering it with the hand, or by pressing the "STOP" button on the User's Console. The latter method is the more effective, as it stops the Verbex 3000 from listening and is easy to clear: simply press the "YES" key in response to the "CONTINUE?" message on the display.

7. Don't panic if the above happens. "CONTROL_K" can be issued from anywhere and will return the process to Nwisgram1 ("ENTER NWISS COMMAND PLEASE" is displayed).

IV. CONCLUSIONS AND RECOMMENDATIONS

character, as from a keyboard, but instead "buffer" and parse the entire command; the user can then "send" the buffer if it is satisfactory or cancel it. Prompting can be adequately provided by the Verbex 3000 in such a setup.

Some "requirements" have not been met. In particular, the ability to specify the "TIME" of a command is not available. This is the result of a design judgment that it would be too difficult to provide and is seldom used. Another lack is the ability to get help (keyboard "?") at any place in a command. The problem here is that, to get the desired effect, every grammar must have a multitude of legal paths defined which end in "?". However, with careful study, one might be able to redefine some of the NWISS grammars to allow "help" to be spoken in the middle of a command where it might most be needed. Here is a situation where perhaps an NWISS modification, such as a separate "HELP" command whereby one would specify the command and/or command argument where help is needed, might be easier to accomplish.

One or two additional grammars would be required on the Verbex 3000, but there is room for that. Creation of such grammars would also facilitate implementing the "CANCEL" command of NWISS, which is not implemented currently.

There will be some who criticize the grammar boundaries as either being misplaced or just too "discrete". Either criticism could well be valid: misplacements can be corrected to some extent, but shortening the pauses required at grammar boundaries will be more difficult. Only faster processors and faster, larger memories can solve this problem. The Verbex 3000 represents the state of the art (commercially) today in terms of affordable continuous voice recognition technology.

Sometime in the foreseeable future men will talk naturally to machines, and machines will talk back in clear, understandable prose. Obviously the NWISS continuous voice application is far removed